Brain decoding with SVM
Support vector machines
Fig. 4 An SVM aims at finding an optimal hyperplane to separate two classes in high-dimensional space, while maximizing the margin. Image from the scikit-learn SVM documentation, under BSD 3-Clause license.
We are going to train a support vector machine (SVM) classifier for brain decoding on the Haxby dataset. SVM often performs well in high-dimensional spaces, and it is a popular technique in neuroimaging.
In the SVM algorithm, each data item is plotted as a point in N-dimensional space, where N is the number of features used to classify the data points (e.g. when the number of features is 3, the hyperplane becomes a two-dimensional plane). The objective is to find a hyperplane (a decision boundary that helps classify the data points) with the maximum margin, i.e. the maximum distance between data points of the two classes. Data points falling on either side of the hyperplane are attributed to different classes.
The scikit-learn documentation contains a detailed description of the different variants of SVM, as well as examples of applications with simple datasets.
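To make the maximum-margin idea concrete, here is a minimal standalone sketch on a synthetic 2D dataset (the variable names `X_toy`, `y_toy` and the cluster layout are illustrative, not part of the Haxby analysis):

```python
import numpy as np
from sklearn.svm import SVC

# Two point clouds in 2D (N = 2 features), centered at (-2, -2) and (2, 2)
rng = np.random.RandomState(0)
X_toy = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y_toy = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel='linear', C=1).fit(X_toy, y_toy)

# With 2 features, the separating hyperplane is a line: w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane normal:", w, "intercept:", b)

# The margin width is 2 / ||w||; the support vectors lie on its boundary
print("margin width:", 2 / np.linalg.norm(w))
print("support vectors per class:", clf.n_support_)
```

Increasing `C` penalizes margin violations more heavily, which narrows the margin; this trade-off is what the exercises below ask you to explore on the real data.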
Getting the data
We are going to download the dataset from Haxby and colleagues (2001) [HGF+01]. You can check the section An overview of the Haxby dataset for more details on that dataset. Here we quickly download it and prepare it for machine learning applications, with a set of predictive variables, the brain time series X, and a dependent variable, the cognitive annotations y.
import os
import warnings
warnings.filterwarnings(action='once')
from nilearn import datasets
# We are fetching the data for subject 4
data_dir = os.path.join('..', 'data')
sub_no = 4
haxby_dataset = datasets.fetch_haxby(subjects=[sub_no], fetch_stimuli=True, data_dir=data_dir)
func_file = haxby_dataset.func[0]
# mask the data
from nilearn.input_data import NiftiMasker
mask_filename = haxby_dataset.mask_vt[0]
masker = NiftiMasker(mask_img=mask_filename, standardize=True, detrend=True)
X = masker.fit_transform(func_file)
# cognitive annotations
import pandas as pd
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
y = behavioral['labels']
Let’s check the size of X and y:
categories = y.unique()
print(categories)
print(y.shape)
print(X.shape)
['rest' 'face' 'chair' 'scissors' 'shoe' 'scrambledpix' 'house' 'cat'
'bottle']
(1452,)
(1452, 675)
So we have 1452 time points, with one cognitive annotation each, and for each time point we have recordings of fMRI activity across 675 voxels. We can also see that the cognitive annotations span 9 different categories.
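Note that the categories are not equally represented: the `rest` condition dominates, which matters for the class-imbalance exercise at the end. With the Haxby labels one would simply call `y.value_counts()`; here is the idea on a toy pandas Series (the `y_demo` data is made up for illustration):

```python
import pandas as pd

# Toy stand-in for the real annotations, where 'rest' dominates
y_demo = pd.Series(['rest'] * 6 + ['face'] * 2 + ['house'] * 2)

# value_counts() shows how many time points each category has
print(y_demo.value_counts())
```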
Training a model
We are going to start by splitting our dataset between train and test: we keep 20% of the time points as a test set, and train on the remaining 80%.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
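Because the categories are imbalanced, one may also pass `stratify=y` to `train_test_split` so that each category keeps the same proportion in the train and test sets. A standalone sketch on synthetic data (the `X_demo`/`y_demo` arrays are illustrative; with the Haxby data you would pass the real `X` and `y`):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data with a 60/20/20 class split
X_demo = np.random.randn(100, 5)
y_demo = np.array(['rest'] * 60 + ['face'] * 20 + ['house'] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo)

# Each class keeps the same share in both splits
for label in np.unique(y_demo):
    print(label, (y_tr == label).mean(), (y_te == label).mean())
```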
Now we can initialize a SVM classifier, and train it:
from sklearn.svm import SVC
model_svm = SVC(random_state=0, kernel='linear', C=1)
model_svm.fit(X_train, y_train)
SVC(C=1, kernel='linear', random_state=0)
Assessing performance
Let’s check the accuracy of the prediction on the training set:
from sklearn.metrics import classification_report
y_train_pred = model_svm.predict(X_train)
print(classification_report(y_train, y_train_pred))
precision recall f1-score support
bottle 1.00 1.00 1.00 85
cat 1.00 1.00 1.00 88
chair 1.00 1.00 1.00 90
face 1.00 1.00 1.00 81
house 1.00 1.00 1.00 91
rest 1.00 1.00 1.00 471
scissors 1.00 1.00 1.00 81
scrambledpix 1.00 1.00 1.00 90
shoe 1.00 1.00 1.00 84
accuracy 1.00 1161
macro avg 1.00 1.00 1.00 1161
weighted avg 1.00 1.00 1.00 1161
This is dangerously high: perfect training accuracy usually indicates overfitting. Let's check on the test set:
y_test_pred = model_svm.predict(X_test)
print(classification_report(y_test, y_test_pred))
precision recall f1-score support
bottle 0.72 0.78 0.75 23
cat 0.67 0.70 0.68 20
chair 0.74 0.78 0.76 18
face 0.89 0.93 0.91 27
house 0.93 0.82 0.87 17
rest 0.91 0.89 0.90 117
scissors 0.83 0.74 0.78 27
scrambledpix 0.85 0.94 0.89 18
shoe 0.72 0.75 0.73 24
accuracy 0.84 291
macro avg 0.81 0.81 0.81 291
weighted avg 0.84 0.84 0.84 291
We can have a look at the confusion matrix:
# confusion matrix
import sys
import numpy as np
from sklearn.metrics import confusion_matrix
sys.path.append('../src')
import visualization
cm_svm = confusion_matrix(y_test, y_test_pred)
model_conf_matrix = cm_svm.astype('float') / cm_svm.sum(axis=1)[:, np.newaxis]
visualization.conf_matrix(model_conf_matrix,
                          categories,
                          title='SVM decoding results on Haxby')
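If the custom `visualization` helper from `../src` is not available, scikit-learn can normalize the confusion matrix for you (and `ConfusionMatrixDisplay.from_predictions` can plot it). A standalone sketch on toy labels (the `y_true`/`y_pred` values are made up for illustration):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array(['face', 'face', 'house', 'house', 'rest', 'rest'])
y_pred = np.array(['face', 'house', 'house', 'house', 'rest', 'face'])

# normalize='true' divides each row by the class count, i.e. per-class recall
cm = confusion_matrix(y_true, y_pred, normalize='true')
print(cm)

# After row normalization, each row sums to 1
print(cm.sum(axis=1))
```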
Visualizing the weights
Finally, we can visualize the weights of the (linear) classifier to see which brain regions seem to influence the decision most, for example for faces:
from nilearn import plotting
# first row of coef_ compares the first pair of class labels;
# with 9 classes, there are 9 * 8 / 2 = 36 distinct pairwise classifiers
coef_img = masker.inverse_transform(model_svm.coef_[0, :])
plotting.view_img(
coef_img, bg_img=haxby_dataset.anat[0],
title="SVM weights", dim=-1, resampling_interpolation='nearest'
)
And now the easy way
We can use the high-level Decoder object from Nilearn. See Decoder object for details. It reduces model specification and fit to two lines of code:
from nilearn.decoding import Decoder
# Specify the classifier in the Decoder object.
# The Decoder can take the mask image directly.
#
# cv=5 means that we use 5-fold cross-validation
#
# As a scoring scheme, one can use f1, accuracy or ROC-AUC
#
decoder = Decoder(estimator='svc', cv=5, mask=mask_filename, scoring='f1')
decoder.fit(func_file, y)
That's it! We can now look at the results: the F1 scores and the coefficient image:
print('F1 scores')
for category in categories:
    print(category, '\t\t {:.2f}'.format(np.mean(decoder.cv_scores_[category])))
plotting.view_img(
decoder.coef_img_['face'], bg_img=haxby_dataset.anat[0],
title="SVM weights for face", dim=-1, resampling_interpolation='nearest'
)
F1 scores
rest 0.80
face 0.30
chair 0.27
scissors 0.25
shoe 0.23
scrambledpix 0.31
house 0.29
cat 0.22
bottle 0.19
Note: the Decoder implements a one-vs-all strategy, which is in general a better choice than one-vs-one.
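The difference between the two multiclass strategies can be illustrated with scikit-learn's generic wrappers. A standalone sketch on a synthetic 4-class problem (the `X_demo`/`y_demo` data is made up for illustration):

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X_demo = rng.randn(80, 10)
y_demo = np.repeat(np.arange(4), 20)

# One-vs-rest trains one classifier per class
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X_demo, y_demo)
# One-vs-one trains one classifier per pair of classes
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X_demo, y_demo)

print(len(ovr.estimators_))  # one per class
print(len(ovo.estimators_))  # 4 * 3 / 2 pairwise classifiers
```

With 4 classes, one-vs-rest fits 4 models while one-vs-one fits 6; with the 9 Haxby categories, the gap grows to 9 versus 36, and each one-vs-rest weight map is directly interpretable as "this class versus everything else".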
Getting more meaningful weight maps with FREM
It is often tempting to interpret regions with high weights as 'important' for the prediction task. However, there is no statistical guarantee on these maps. Moreover, they often do not exhibit a very clear structure. To improve on this, regularization can be added by using the so-called fast regularized ensemble of models (FREM), which relies on simple averaging and clustering tools to provide smoother maps, with minimal computational overhead.
from nilearn.decoding import FREMClassifier
frem = FREMClassifier(estimator='svc', cv=5, mask=mask_filename, scoring='f1')
frem.fit(func_file, y)
plotting.view_img(
frem.coef_img_['face'], bg_img=haxby_dataset.anat[0],
title="SVM weights for face", dim=-1, resampling_interpolation='nearest'
)
Note that the resulting accuracy is in general slightly higher:
print('F1 scores with FREM')
for category in categories:
    print(category, '\t\t {:.2f}'.format(np.mean(frem.cv_scores_[category])))
Exercises
- What is the most difficult category to decode? Why?
- The model seemed to overfit. Can you find a parameter value for `C` in `SVC` such that the model does not overfit as much?
- Try a `'rbf'` kernel in `SVC`. Can you get a better test accuracy than with the `'linear'` kernel?
- Try to explore the weights associated with other labels.
- Instead of doing a 5-fold cross-validation, one should split the data by runs. Implement a leave-one-run-out and a leave-two-runs-out cross-validation. For that you will need to access the run information, which is stored in `behavioral['chunks']`. You will also need the `LeavePGroupsOut` object of scikit-learn.
- Try implementing a random forest or k-nearest-neighbors classifier.
- Hard: implement a systematic hyper-parameter optimization using nested cross-validation. Tip: check this scikit-learn tutorial.
- Hard: try to account for class imbalance in the dataset.